Jack Davis
2026-01-15
Directly from https://viso.ai/applications/computer-vision-in-sports/
List of visual AI applications in sports
Object detection is the recognition of where an object of interest (e.g., players, the ball/puck, nets, boundary lines) is on a 2D image, marked with a bounding box around it. This is an extension of the OCR (optical character recognition) that we talked about last term with the card valuation.
In sports analytics, we often don’t just detect someone as “person”, but as a particular player, usually using the number on the back of the uniform. Additional work is done in video processing so that detection of a particular player is continuous even when their number is not visible from the camera’s position. (e.g., using a player’s previously known position and velocity to infer that a person must be player X because nobody else could plausibly be at that location at that time.)
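That inference step can be sketched as a simple nearest-neighbour assignment: predict each known player’s position from their last position and velocity, then attach an unlabelled detection to the closest prediction if nobody else could plausibly be there. This is a minimal illustration, not a production tracker, and the player numbers and coordinates below are invented:

```python
def predict_position(pos, vel, dt=1.0):
    """Predict where a player will be next frame from position and velocity."""
    return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)

def identify_player(detection, known_players, max_dist=3.0):
    """Assign an unlabelled detection (x, y) to the player whose predicted
    position is closest, but only if it is within a plausible distance."""
    best_id, best_d = None, float("inf")
    for pid, (pos, vel) in known_players.items():
        px, py = predict_position(pos, vel)
        d = ((detection[0] - px) ** 2 + (detection[1] - py) ** 2) ** 0.5
        if d < best_d:
            best_id, best_d = pid, d
    return best_id if best_d <= max_dist else None

# Player 12 was at (10, 5) moving right; player 7 was at (40, 20), stationary.
players = {12: ((10.0, 5.0), (2.0, 0.0)), 7: ((40.0, 20.0), (0.0, 0.0))}
print(identify_player((12.1, 5.2), players))  # 12: closest to that player's predicted spot
```

Real trackers (e.g., Kalman-filter-based ones) also model uncertainty, but the core idea is the same: motion narrows down who a detection can be.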
Text and image from: https://www.baeldung.com/cs/pose-estimation
“The objective of Pose Estimation, a general problem in computer vision, is to identify the location and orientation of an item or human. In the case of human pose estimation, we typically accomplish this by estimating the locations of various key points like hands, heads, elbows, and so on. These key points in photos and videos are what our machine-learning models seek to track”
“In photos or videos, human pose estimation recognizes and categorizes the positions of human body components and joints. To represent and infer human body positions in 2D and 3D space, a model-based technique is typically used. One particular class of flexible objects includes people. Keypoints will be in different positions concerning others when we bend our arms or legs.”
“2D human pose estimation is estimating the 2D position or spatial placement of key points on the human body from visuals like photos and movies. It is simply the estimations of keypoint locations in 2D space concerning an image or video frame. For every key point, the model predicts an X and Y coordinate.”
Homography is the inference of how points on a 2D image map to positions in the real world (strictly, a mapping between two planes, such as the image plane and the ground). In most sports we have the advantage of lines marking the dimensions of certain things on the ground. If we can use object detection on those lines, we can infer where the camera is and where it’s pointing. From there, we can infer where somebody is in 3D space.
See also: “Improving Robustness of Homography Estimation for Ice Rink Registration” by Jason Shang https://uwspace.uwaterloo.ca/items/95a2a33c-f681-4c5d-9982-e217a002e8be
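As a minimal sketch of the idea: given four points where field lines intersect in the image, and the known real-world coordinates of those same points, we can solve for the 3x3 homography and map any pixel (say, a player’s feet) onto the ground plane. In practice you would use OpenCV’s findHomography; the NumPy-only version below shows the underlying linear system, and all coordinates are invented:

```python
import numpy as np

def estimate_homography(src, dst):
    """Solve for the 3x3 homography H mapping src -> dst (4 point pairs),
    using the standard DLT linear system with h33 fixed to 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pt):
    """Map an image point (pixels) through H to ground coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

img_pts = [(100, 400), (540, 390), (460, 120), (180, 130)]   # pixels (invented)
world_pts = [(0, 0), (20, 0), (20, 10), (0, 10)]             # metres (invented)
H = estimate_homography(img_pts, world_pts)
print(apply_homography(H, (100, 400)))  # maps back to the first corner, (0, 0)
```

The same H then converts any detected foot position in pixels to a position on the field, which is what makes camera-only player positioning possible.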
Source: https://source.roboflow.com/dWImQnUpSGZKKsggZYU1tvb3g7m2/s9PP8EhtBZhpzvmEUMSs/original.jpg
Source: https://www.baeldung.com/wp-content/uploads/sites/4/2023/01/fig2.png
Here are a few examples of how those key points (e.g., recognized body parts and joints of a player) are inferred…
…the leg bone’s connected to the hip bone!
Photo from: https://sigmoidal.ai/
Photo from: https://learnopencv.com/
(OpenCV is a C++ library that has both a Python wrapper and an R wrapper)
In sports analytics, pose estimation can be used to identify what action is happening (e.g., a bump, set, or spike in volleyball, or a slap shot, wrist shot, or pass in hockey).
It can also be used to identify baseball pitching or cricket bowling approaches, to determine what type of throw it is, or at what point in the motion a batter could identify the type of throw.
Image source: https://objectways.com/
It can be used to infer which way a player is facing, which can determine their probable range of vision and what part of the field they can control.
It can be used to infer the load being placed on a player biomechanically to help predict future injury risk.
It is used in most Padel analytics tools on the market today. (Source: https://pub.mdpi-res.com/sensors/sensors-21-03368/article_deploy/html/images/sensors-21-03368-g004.png )
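For several of the uses above (action recognition, facing direction, biomechanical load), a common first step is turning raw keypoints into joint angles. A minimal sketch, with invented 2D keypoint coordinates standing in for what a pose model would return:

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (degrees), formed by keypoints a-b-c,
    e.g. shoulder-elbow-wrist for an elbow angle."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

# Hypothetical 2D keypoints: shoulder, elbow, wrist
shoulder, elbow, wrist = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0)
print(joint_angle(shoulder, elbow, wrist))  # 90.0: a right-angle elbow bend
```

Sequences of such angles over video frames are what an action classifier (bump vs. set vs. spike) or a biomechanical load model would actually consume.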
Here we are going to demonstrate object recognition for soccer games in Python.
Unfortunately, public support for computer vision tasks in R has stalled with RopenCV around 2017, so we’re going to use the ultralytics package in Python instead.
Source: https://docs.ultralytics.com/tasks/pose/#models
Install ultralytics first ( https://www.jetbrains.com/help/pycharm/installing-uninstalling-and-upgrading-packages.html#install-in-tool-window ). If you’re using PyCharm, look for “Python Packages” in the lower left.
YOLO is an image recognition model that has already been pretrained for our use.
“The YOLO pose dataset format can be found in detail in the Dataset Guide. To convert your existing dataset from other formats (like COCO etc.) to YOLO format, please use the JSON2YOLO tool by Ultralytics.”
To train a model, first we can load one.

from ultralytics import YOLO

#model = YOLO("yolo11n-pose.yaml")  # build a new model from YAML
#model = YOLO("yolo11n-pose.pt")  # load a pretrained model (recommended for training)
model = YOLO("yolo11n-pose.yaml").load("yolo11n-pose.pt")  # build from YAML and transfer weights

Then, if we wish, we can train it further on a particular dataset. For example, here is the COCO dataset: https://cocodataset.org/#keypoints-2018 , from the COCO 2018 Keypoint Detection Task. (COCO stands for “Common Objects in COntext”.)
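The training call itself looks like the example in the Ultralytics pose docs. Here coco8-pose.yaml names a tiny 8-image sample dataset that Ultralytics ships for smoke-testing; you would substitute a YAML describing your own sport-specific keypoint data. (Running this downloads weights and data, so treat it as a sketch, not something to run in class.)

```python
from ultralytics import YOLO

model = YOLO("yolo11n-pose.pt")  # pretrained pose model

# Fine-tune on a keypoint dataset described by a dataset YAML file.
results = model.train(data="coco8-pose.yaml", epochs=100, imgsz=640)
```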
Next, we can validate the model by getting its metrics.
# Get the metrics as an object
metrics = model.val()  # no arguments needed, dataset and settings remembered

This returns a bunch of floating-point metrics, which are explained at https://docs.ultralytics.com/guides/yolo-performance-metrics/
Notably: “Average Precision (AP): AP computes the area under the precision-recall curve, providing a single value that encapsulates the model’s precision and recall performance.”
and Mean Average Precision (mAP)
https://www.ultralytics.com/glossary/mean-average-precision-map
“mAP at 50: This metric considers a prediction correct if it overlaps with the ground truth by at least 50%.”

“mAP at 50-95: Popularized by the COCO dataset, this is the modern gold standard. It averages the mAP calculated at steps of 0.05 from IoU 0.50 to 0.95. This rewards models that not only find the object but locate it with extreme pixel-level accuracy, a key feature of Ultralytics YOLO11.”
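Those thresholds are defined in terms of IoU (Intersection over Union): the overlap area of the predicted and ground-truth boxes divided by the area of their union. A minimal sketch, with invented box coordinates in (x1, y1, x2, y2) form:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0]); iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2]); iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A prediction shifted half a box-width off the ground truth:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333..., so it fails an IoU-0.50 test
```

So “mAP at 50” counts the box above as a miss, and “mAP at 50-95” would penalize even smaller misalignments.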
Now to see the whole thing in action
# Load a model
model2 = YOLO("yolo11n.pt") # pretrained YOLO11n model
# Run batched inference on a list of images
results3 = model2(["image1.jpg", "image2.jpg", "image3.jpg", "image4.jpg"])  # return a list of Results objects
0: 640x640 1 person, 1 sports ball, 34.8ms
1: 640x640 1 person, 34.8ms
2: 640x640 6 persons, 1 sports ball, 34.8ms
3: 640x640 1 person, 1 sports ball, 34.8ms
Speed: 2.4ms preprocess, 34.8ms inference, 0.3ms postprocess per image at shape (1, 3, 640, 640)
# Process results list
for result in results3:
    boxes = result.boxes  # Boxes object for bounding box outputs
    masks = result.masks  # Masks object for segmentation mask outputs
    keypoints = result.keypoints  # Keypoints object for pose outputs
    probs = result.probs  # Probs object for classification outputs
    obb = result.obb  # Oriented boxes object for OBB outputs
    result.show()  # display to screen
    result.save(filename="result.jpg")  # save to disk
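A typical next step is to tally detections per class across the batch. In the real pipeline you would read the class indices from result.boxes.cls and look up their names in model2.names; the runnable sketch below uses stand-in data matching the console output shown above (COCO class 0 is "person" and 32 is "sports ball"):

```python
from collections import Counter

# Stand-in for model2.names and the per-image class ids that
# result.boxes.cls would give us, matching the four images above.
names = {0: "person", 32: "sports ball"}
per_image_cls = [[0, 32], [0], [0, 0, 0, 0, 0, 0, 32], [0, 32]]

counts = Counter(names[c] for cls_ids in per_image_cls for c in cls_ids)
print(counts)  # Counter({'person': 9, 'sports ball': 3})
```

From here you could filter by confidence (result.boxes.conf) or pull pixel coordinates (result.boxes.xyxy) to feed the homography step described earlier.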